Name | Modified | Size | Downloads / Week |
---|---|---|---|
CIEMPIESS_Spanish_Models_581h.zip | 2019-08-24 | 159.6 MB | |
README.txt | 2019-08-23 | 4.0 kB | |
LICENSE.txt | 2019-08-23 | 35.1 kB | |
Totals: 3 Items | | 159.6 MB | 16 |

-------------------------------------------------------------------------------------------------
The CIEMPIESS Spanish Models
PocketSphinx Acoustic Models in Spanish made out of 581 hours of audio
by Dr. Carlos Daniel Hernández Mena
-------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------
PRESENTATION
-------------------------------------------------------------------------------------------------

The CIEMPIESS Spanish Models are acoustic models designed to work with PocketSphinx.
The 581 hours of audio recordings used to train the models come from many LDC datasets
(including all the CIEMPIESS corpora except CIEMPIESS-TEST) and from other sources
collected by the social service program "Desarrollo de Tecnologías del Habla" and the
CIEMPIESS-UNAM project. Both belong to the "Universidad Nacional Autónoma de México"
(UNAM) in Mexico City.

-------------------------------------------------------------------------------------------------
MODEL CHARACTERISTICS
-------------------------------------------------------------------------------------------------

- Most of the audio files used in the training stage contain clean speech. The training
  corpus mixes read and spontaneous speech in many accents of Spanish, including accents
  from Mexico, Spain and elsewhere in Latin America.

- The acoustic models are continuous and context-dependent (CD). 10,000 senones were
  used to create them.

- The audio format of the training files is Microsoft WAV, 16 kHz, 16-bit, mono.

- The pronouncing dictionary contains more than 285,000 words.

- The phonetic alphabet used in the pronouncing dictionary is called Mexbet.
  For more information about Mexbet see www.ciempiess.org

- The phonetic transcriptions used in the pronouncing dictionary were made using a
  G2P tool called "fonetica3 library". For more information see www.ciempiess.org

- The text used for the language model comes from many sources, including Wikipedia,
  transcribed interviews and newspapers.

- The language model was created using SRILM.

-------------------------------------------------------------------------------------------------
TERMS OF USE
-------------------------------------------------------------------------------------------------

The CIEMPIESS Spanish Models by Carlos Daniel Hernández Mena are free software; you can
redistribute them and/or modify them under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.

The CIEMPIESS Spanish Models were created in May 2019.

-------------------------------------------------------------------------------------------------
ACKNOWLEDGEMENTS
-------------------------------------------------------------------------------------------------

The author would like to thank Alejandro V. Mena, Elena Vera and Angélica Gutiérrez for
their support of the social service program "Desarrollo de Tecnologías del Habla".
The author also thanks the social service students for all their hard work.

-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
For more information and documentation see the CIEMPIESS-UNAM Project website at:
http://www.ciempiess.org/
-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
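
The training audio format listed under MODEL CHARACTERISTICS (Microsoft WAV, 16 kHz, 16-bit, mono) can be checked against a recording before it is decoded. The README does not say that input audio must match this format; treating a mismatch as a problem is an assumption, based on the common practice of feeding an acoustic model audio in the format it was trained on. The following is a minimal sketch using only the Python standard library. The function name `wav_format_problems` is hypothetical and is not part of the CIEMPIESS package or of PocketSphinx.

```python
import wave

# Training format stated in the README: 16 kHz, 16-bit, mono WAV.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # mono


def wav_format_problems(path):
    """Return a list of mismatches between a WAV file and the training format.

    Hypothetical helper: an empty list means the file header reports
    16 kHz, 16-bit, mono audio. Only the header is inspected.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()

    if rate != EXPECTED_RATE_HZ:
        problems.append(
            "sample rate is %d Hz, expected %d Hz" % (rate, EXPECTED_RATE_HZ)
        )
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(
            "sample width is %d bits, expected %d bits"
            % (width * 8, EXPECTED_SAMPLE_WIDTH_BYTES * 8)
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(
            "file has %d channels, expected %d" % (channels, EXPECTED_CHANNELS)
        )
    return problems


# Example use (the file name is a placeholder):
#
#     for problem in wav_format_problems("recording.wav"):
#         print(problem)
```

A file that reports any mismatch would typically be resampled or converted to mono with an audio tool before decoding. The check above only reads the WAV header and does not inspect the audio content itself.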